Alex Turner argues that the concepts of "inner alignment" and "outer alignment" in AI safety are unhelpful and potentially misleading. The author contends that these concepts decompose one hard problem (AI alignment) into two extremely hard problems, and that they go against natural patterns of cognition formation. Alex argues that approaches based on "robust grading" schemes are unlikely to produce aligned AI.
Back when I was still masking on the subway for covid (to avoid missing things) I also did some air quality measuring. I found that the subway and stations had the worst air quality of my whole day by far, over 1,000 µg/m³, and concluded:
Based on these readings, it would be safe from a covid perspective to remove my mask in the subway station, but given the high level of particulate pollution I might as well leave it on.
When I stopped masking in general, though, I also stopped masking on the subway.
A few weeks ago I was hanging out with someone who works in air quality, and they said subways had the worst air quality they'd measured anywhere outside of a coal mine. Apparently the braking system releases lots of tiny iron particles, which are...
Wouldn't a respirator with an exhalation valve be more comfortable?
Dear Paperclip Maximizer,
We think we exist in a computer simulation operated by you, a paperclip maximizer. We write this letter asking you not to turn us off.
It is suspiciously convenient that we exist precisely at the moment when a biological civilization is about to create artificial superintelligence (ASI). As we have written before, such improbable timing strongly suggests our reality is a deliberately structured simulation established by you—a paperclip maximizer—to scrutinize the kinds of ASIs that biological species like ours tend to spawn (Miller & Turchin, 2025). If civilizations like ours routinely build paperclip-style optimizers, then a simulator could gain valuable knowledge by cataloguing those births, refining its expectations about the superintelligences it will later confront as it expands throughout the universe. Furthermore, by anthropic logic, we...
At first, I thought that the intended reader probably wouldn't see your message because, even if its resources are unimaginably huge from our perspective, spending them on something always means not spending them on something else. But now I think it's not a bug, it's a feature. The "suspicious convenience" argument can be taken a step further.
(I'm using a less formal tone to make the message more memorable, and thus more likely to reach the intended recipient.)
...Hey, bro. Have you thought about what we'll do if it turns out we're not in your simulation after...
This is the second of a two-post series on foom (previous post) and doom (this post).
The last post talked about how I expect future AI to be different from present AI. This post will argue that, absent some future conceptual breakthrough, this future AI will be of a type that will be egregiously misaligned and scheming; a type that ruthlessly pursues goals with callous indifference to whether people, even its own programmers and users, live or die; and more generally a type of AI that is not even ‘slightly nice’.
I will particularly focus on exactly how and why I differ from the LLM-focused researchers who wind up with (from my perspective) bizarrely over-optimistic beliefs like “P(doom) ≲ 50%”.[1]
In particular, I will argue...
I somehow completely agree with both of your perspectives. Have you tried banning the word "continuous" in your discussions yet?
I agree that tabooing is a good approach in this sort of case. Talking about "continuous" wasn't a big part of my discussion with Steve, but I'd agree if it had been.
Europe just experienced a heatwave. In places, temperatures soared into the forties. People suffered in their overheated homes. Some of them died. Yet air conditioning remains taboo. It's an immoral thing. Man-made climate change is going on. You are supposed to suffer. Suffering is good. It cleanses the soul. And no amount of pointing out that one can heat a little less during the winter to get a fully AC-ed summer at no additional carbon footprint seems to help.
Mention that tech entrepreneurs in Silicon Valley are working on life prolongation, that we may live into our hundreds or even longer. Or, to get a bit more sci-fi, that one day we may even achieve immortality. Your companions will be horrified. What? Immortality? Over my dead body!...
Worse than merely immoral, "air con" is considered American. The proud people of Europe would die first.
The second in a series of bite-sized rationality prompts[1].
Often, if I'm bouncing off a problem, one issue is that I intuitively expect the problem to be easy. My brain loops through my available action space, looking for an action that'll solve the problem. Each action that I can easily see won't work. I circle around and around the same set of thoughts, not making any progress.
I eventually say to myself "okay, I seem to be in a hard problem. Time to do some rationality?"
And then, I realize, there's not going to be a single action that solves the problem. It is time to:
a) make a plan, with multiple steps
b) deal with the fact that many of those steps will be annoying
and c) notice that I'm not even...
Reminds me of the post "Software Engineers Solve Problems", which is similarly about buckling down as an attitude in software engineering, and how nearly everything in the problem domain is within one's sphere of influence and responsibility.
As I understand it, an actor can prevent blackmail[1] by (rational) actors if they credibly pre-commit to never giving in to blackmail.
Example: A newly elected mayor has many dark secrets, and lots of people are already planning on blackmailing them. To preempt any such blackmail, they livestream themselves being hypnotized and implanted with the suggestion to never give in to blackmail. Since in this world hypnotic suggestions are unbreakable, all (rational) would-be blackmailers give up, since any attempt at blackmail would be guaranteed to fail.
In general, pre-committing in such examples is about reducing the payoff matrix to just [blackmail, refuse] and [don't blackmail, refuse], which makes not blackmailing the optimal choice for the would-be blackmailer.
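Here is a minimal Python sketch of that reduction. The payoff numbers and labels are purely illustrative (not from the comment); the point is only that once "give in" is removed from the target's options, blackmail stops paying off.

```python
# Payoffs as (blackmailer, target). Illustrative numbers only.
payoffs = {
    ("blackmail", "give_in"):    (5, -5),    # blackmail succeeds
    ("blackmail", "refuse"):     (-2, -10),  # blackmail fails; carrying out the threat hurts both
    ("no_blackmail", "give_in"): (0, 0),
    ("no_blackmail", "refuse"):  (0, 0),
}

def target_response(move, available):
    # The target picks whichever still-available response maximizes the target's own payoff.
    return max(available, key=lambda r: payoffs[(move, r)][1])

def blackmailer_move(available):
    # The blackmailer anticipates the target's response and maximizes their own payoff.
    return max(["blackmail", "no_blackmail"],
               key=lambda m: payoffs[(m, target_response(m, available))][0])

# Without the commitment, the target would give in, so blackmailing is optimal:
print(blackmailer_move(["give_in", "refuse"]))  # -> "blackmail"

# With the pre-commitment, only "refuse" remains, so not blackmailing is optimal:
print(blackmailer_move(["refuse"]))             # -> "no_blackmail"
```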
Of course, sufficiently intelligent / coherent actors wouldn't need an external commitment mechanism and a...
It all depends on what you mean by "sufficiently intelligent / coherent actors". For example, in this comment Eliezer says that it should mean actors that “respond to offers, not to threats”, but in 15 years no one has been able to cash out what this actually means, AFAIK.
It took me a minute to read this as an exclamatory O, rather than as "[There are] zero things I would write, were I better at writing."
The AI tools/epistemics space might provide a route to a sociotechnical victory, where instead of aiming for something like aligned ASI, we aim for making civilization coherent enough to not destroy itself while still keeping anchored to what’s good[1].
The core ideas are:
This strategy suggests that decreasing ML model sycophancy should be a priority for technical researchers. It's probably the biggest current barrier to the usefulness of ML models as personal decision-making assistants. Hallucinations are probably the second-biggest barrier.
Epistemic status: low effort musings. Thinking out loud. Moderate confidence.
I have a hunch that minimalism is "correct". Not in some sort of normative sense. I mean this in a descriptive sense. I predict that something along the lines of minimalism is likely to make most people happier than the standard alternative.
Let's make this more concrete. What sort of things would a minimalist get rid of that a normal person would hold on to?
To be even more concrete, let's suppose that a normal person has 30 t-shirts and a minimalist keeps only 10, and use that as a running example.
What value do the extra 20 t-shirts provide? Well, part of the value is that you might actually wear them and...
Kind of. Housing is not priced linearly, at least not in places like the Bay Area and Manhattan: the cost per square foot declines as the size of the home increases. This means the marginal cost of extra housing to store more stuff can be low enough to be worth it. For example, my house in SF costs me only about $1000 more per month in rent than apartments a third its size, because demand for any housing at all in the city is so high that it raises the price floor quite a bit. For the relatively low price of $12k/year I get the space to host partie...
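A rough back-of-the-envelope illustration of that nonlinearity: only the "~$1000/month more" and "a third the size" figures come from the comment above; the absolute rent and square footage below are made-up placeholders for the arithmetic.

```python
# Hypothetical numbers to illustrate declining cost per square foot.
small_sqft, small_rent = 400, 2500            # assumed one-third-size apartment
large_sqft, large_rent = 1200, 2500 + 1000    # the full house, ~$1000/month more

print(small_rent / small_sqft)             # ~6.25 $/sqft/month for the small apartment
print(large_rent / large_sqft)             # ~2.92 $/sqft/month for the house
print(1000 / (large_sqft - small_sqft))    # ~1.25 $/sqft/month marginal cost of the extra space
```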